{
"cells": [
{
"cell_type": "markdown",
"id": "6d6e66b9-2dcc-4f1d-97ab-0393dbb18afe",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"source": [
"## 2 - The General Linear Model\n",
"### Getting to grips with linear models in Python"
]
},
{
"cell_type": "markdown",
"id": "5981ad66-0254-4287-ae75-a4665c3102de",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"source": [
"This week we will focus on three things:\n",
"- How to do basic, psychology-standard analyses in Python using the `pingouin` package\n",
"- How to implement a general linear model in Python with `statsmodels`\n",
"- and understanding the connection between the two with some examples.\n",
"\n",
"Don't worry if this confusing. It takes many repetitions and practice to get it to stick!"
]
},
{
"cell_type": "markdown",
"id": "6ef352ef-4faa-4eb2-8a0a-e238affe6c4f",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"source": [
"### Part 1 - The `pingouin` package for basic statistics\n",
"You will be familiar with basic statistics in psychology at this point, and it is very useful to know how to do them in Python. The `pingouin` package covers most of these, and it works seamlessly with `pandas`. As before we need to import it, so we will call our needed packages here."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "0cb0f8ca-83c4-4e98-9c52-fdd8d604ba5e",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"outputs": [],
"source": [
"# Import what we need\n",
"import pandas as pd # dataframes\n",
"import seaborn as sns # plots\n",
"import pingouin as pg # stats, note the traditional alias\n",
"\n",
"# Set the style for plots\n",
"sns.set_style('dark') # a different theme!\n",
"sns.set_context('talk')"
]
},
{
"cell_type": "markdown",
"id": "cb640d9f-eaa2-4fc8-814a-5752b41c20c2",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "fragment"
},
"tags": []
},
"source": [
"Lets see how this package allows us to do some basic psychology-style analyses, with some example datasets that we can load from `seaborn`."
]
},
{
"cell_type": "markdown",
"id": "5a609664-abf1-4cd9-adee-8f277ea067e7",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"source": [
"#### The **independent** samples t-test\n",
"This t-test is often the first analysis we learn. With `pingouin`, it is handled by the `pg.ttest` function. Lets load up the `tips` dataset and do a t-test.\n",
"\n",
"**Test**: Do smokers tip more than non-smokers? To test this we need to give the function just the values for each group, which we get by filtering our data."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "68d19d7c-3275-4f0f-b302-5216264343ea",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
total_bill
\n",
"
tip
\n",
"
sex
\n",
"
smoker
\n",
"
day
\n",
"
time
\n",
"
size
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
16.99
\n",
"
1.01
\n",
"
Female
\n",
"
No
\n",
"
Sun
\n",
"
Dinner
\n",
"
2
\n",
"
\n",
"
\n",
"
1
\n",
"
10.34
\n",
"
1.66
\n",
"
Male
\n",
"
No
\n",
"
Sun
\n",
"
Dinner
\n",
"
3
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" total_bill tip sex smoker day time size\n",
"0 16.99 1.01 Female No Sun Dinner 2\n",
"1 10.34 1.66 Male No Sun Dinner 3"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# First load the data\n",
"tips = sns.load_dataset('tips')\n",
"display(tips.head(2))"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "3e4e7b04-d4c0-45af-bb0d-97f3ea38a5a1",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
T
\n",
"
dof
\n",
"
alternative
\n",
"
p-val
\n",
"
CI95%
\n",
"
cohen-d
\n",
"
BF10
\n",
"
power
\n",
"
\n",
" \n",
" \n",
"
\n",
"
T-test
\n",
"
0.0918
\n",
"
192.2634
\n",
"
two-sided
\n",
"
0.9269
\n",
"
[-0.35, 0.38]
\n",
"
0.0122
\n",
"
0.145
\n",
"
0.051
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" T dof alternative p-val CI95% cohen-d BF10 \\\n",
"T-test 0.0918 192.2634 two-sided 0.9269 [-0.35, 0.38] 0.0122 0.145 \n",
"\n",
" power \n",
"T-test 0.051 "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# First filter the data by smokers and nonsmokers, and select the tip column\n",
"smokers = tips.query('smoker == \"Yes\"')['tip']\n",
"nonsmokers = tips.query('smoker == \"No\"')['tip']\n",
"\n",
"# Pass to the t-test function\n",
"pg.ttest(smokers, nonsmokers).round(4)"
]
},
{
"cell_type": "markdown",
"id": "19c89a5f-e5b8-4c0f-82bd-764a3df6cf23",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"source": [
"#### The **paired** samples t-test \n",
"Comparing the means of two variables is achieved with the same function, but telling it that the two variables are paired. Lets compare the mean bill price with the mean tip price - meals should be on average more expensive than their tips!"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "54d2e488-0eb5-4d64-baa6-ed0671eac15a",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
T
\n",
"
dof
\n",
"
alternative
\n",
"
p-val
\n",
"
CI95%
\n",
"
cohen-d
\n",
"
BF10
\n",
"
power
\n",
"
\n",
" \n",
" \n",
"
\n",
"
T-test
\n",
"
32.6465
\n",
"
243
\n",
"
two-sided
\n",
"
0.0
\n",
"
[15.77, 17.8]
\n",
"
2.6352
\n",
"
1.222e+87
\n",
"
1.0
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" T dof alternative p-val CI95% cohen-d BF10 \\\n",
"T-test 32.6465 243 two-sided 0.0 [15.77, 17.8] 2.6352 1.222e+87 \n",
"\n",
" power \n",
"T-test 1.0 "
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Select the columns\n",
"bills = tips['total_bill']\n",
"tipamount = tips['tip']\n",
"\n",
"# Do the test\n",
"pg.ttest(bills, tipamount, paired=True).round(4)"
]
},
{
"cell_type": "markdown",
"id": "cc893df3-f81d-46ef-9b37-02a8fd994f2b",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "fragment"
},
"tags": []
},
"source": [
"One-sample tests are also easily achieved by stating the target value as the second input. For example, to compare the mean tip amount to $1, we do `pg.ttest(tipamount, 1)`."
]
},
{
"cell_type": "markdown",
"id": "ccd03bd9-0aeb-4267-8e4c-9115f46947bc",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"source": [
"#### Correlation\n",
"This useful statistic is a special case of the general linear model as we will see, but it is computed easily enough with `pg.corr`.\n",
"\n",
"Let us correlate the tip amount and total bill amount:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "cd2328d7-842d-41a0-99cf-e32aabedb4dd",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
n
\n",
"
r
\n",
"
CI95%
\n",
"
p-val
\n",
"
BF10
\n",
"
power
\n",
"
\n",
" \n",
" \n",
"
\n",
"
pearson
\n",
"
244
\n",
"
0.675734
\n",
"
[0.6, 0.74]
\n",
"
6.692471e-34
\n",
"
4.952e+30
\n",
"
1.0
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" n r CI95% p-val BF10 power\n",
"pearson 244 0.675734 [0.6, 0.74] 6.692471e-34 4.952e+30 1.0"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Correlation\n",
"pg.corr(tips['total_bill'], tips['tip'])"
]
},
{
"cell_type": "markdown",
"id": "42e2df7f-0249-4018-9902-1323c9bf1035",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"source": [
"#### ANOVA\n",
"The analysis of variance (more about the difference in means, actually) comes in many forms - one way, two way, mixed, repeated, analysis of covariance, and so on. We need not dwell on these confusing definitions, but it is good to demonstrate how they are handled."
]
},
{
"cell_type": "markdown",
"id": "fb990371-444f-4a65-b780-82e5ae923c5b",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"source": [
"#### One-way ANOVA\n",
"Designed for testing mean differences in a continuous outcome under two or more groups, in the `pg.anova` function. Let's examine whether the amount of tips vary over different days - the `tips` dataset has Thursday-Sunday as days. A plot will help us."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "174a7301-ab45-4925-bb81-196593993c59",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAlEAAAHHCAYAAACfqw0dAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/H5lhTAAAACXBIWXMAAA9hAAAPYQGoP6dpAAA4rklEQVR4nO3deVyVZd7H8e8BJMAlFHAB5lFTRJswUUsZJ1Ecm3SyxyVxyxZrzAm1Mmt08snyqWzTGhc0zRZMc0FJtGnMBX1CfemLMnVQRI+aK4gLLgFxOJ7nD1+cIlZvtnM4n/dfnvu67uv87m6Dr9d9neuYbDabTQAAALglbrVdAAAAgDMiRAEAABhAiAIAADCAEAUAAGAAIQoAAMAAQhQAAIABhCgAAAADCFEAAAAGEKIAAAAM8KjtAlyBzWZTQcGN2i4DAABUgIeHm0wmU/n9aqAWl1dQcEPZ2Tm1XQYAAKgAX18f1avnXm4/HucBAAAYQIgCAAAwgBAFAABgACEKAADAAEIUAACAAYQoAAAAAwhRAAAABhCiAAAADCBEAQAAGECIAgAAMIAQBQAAYAAhCgAAwACn+QLir7/+WsuWLVNqaqpsNpt+97vfqX///nriiSfk5eVVoTHy8vLUuXNnWa3WUvvEx8crLCysqsoGAAB1lFOEqLlz52revHlyd3dXly5d1KBBA+3fv18ffPCBvvrqKy1btky33357ueOkpaXJarUqMDBQXbp0KbGPr69vFVcPAIDzKSgo0Jo1K3X0aLratm2nIUOGycPDKWJDjXH4/xopKSmaN2+eGjVqpKVLl6p9+/aSpJycHE2cOFHffvutPvjgA02fPr3csVJTUyVJAwYM0KRJk6q1bgAAnNmaNSuVkLBaknTgwD5J0rBho2qzJIfj8GuiEhISJEl//etf7QFKknx8fDRx4kRJ0rZt2yo0VmGI4nEdAABl27v3uzJfwwlmol577TWNGTNGAQEBxdoK1za5u7tXaCxCFAAAFZOf/3OZr+EEIcrDw0Nt2rQpdvzcuXN6++23JUmDBw8ud5z8/HyZzWb5+vpq165dWrlypY4ePSqbzaaOHTvqqaeeUo8ePaq8fgAAUDc5/OO833rrrbc0YsQI9enTRwcOHNCYMWM0bty4cs9LS0uTxWJRdna2pk6dKknq1q2b/P39tXPnTo0ZM0aLFi2q7vIBAEAd4fAzUb+1Zs0aXb16VZLk6emprKwsXbhwQU2bNi3zvIMHD0qSmjZtqtjY2CKP9BISEvTyyy9r9uzZCg8P1z333FN9FwAAAOoEp5uJSkxM1L59+7R69Wp17txZ69ev14gRI5STk1PmedHR0UpKSipxH6hBgwZp5MiRstlsiouLq87yAQBAHeF0IapFixby8vJSx44dtXjxYrVr106nT5/WqlWryjzPzc1NgYGBatasWYntffr0kSQdOHCgymsGAAB1j9OFqF/z9PRUv379JP3yuM6o5s2bS5Jyc3MrXRcAAKj7HD5EzZkzR88995wyMjJKbPf09JR0c2fVssTGxmrixInatWtXie2F4xeGKQAAgLI4fIjasWOHvv76a3311Vcltm/fvl1S+Xs/HT9+XBs3brRv3vlbhcd79eplvFgAAOAyHD5EjRp1c4v5efPmaf/+/fbjFotF7733nvbs2SM/Pz8NGTLEftxsNstsNstisdj7jxw5UiaTSYmJiUpMTCzyHnFxcVq3bp18fX316KOP1sBVAQAAZ+fwWxw89NBDSklJ0cqVKzVs2DCFh4erUaNGOnTokDIyMuTr66sFCxaoUaNGkqTMzEz1799fkrRlyxYFBwdLksLDwzVp0iTNmjVLL774opYsWaKWLVvqyJEjOnbsmHx8fDR//nz5+fnV2rUCAADn4fAhSpJmzJih7t2764svvlBqaqry8/MVGBioxx57TE8++WSpn7j7rbFjxyosLEyffPKJ9u3bJ7PZrICAAEVHR2vcuHEKCgqq5isBAAB1hclms9lqu4i6zmKxKju77H2sAABwJJMmxejs2TP214GBQZo9e34tVlRzfH19VK9e+d/L6/BrogAAABwRIQoAAMAAQhQAAIABhCgAAAADCFEAAAAGOMUWBwCA2ldQUKA1a1bq6NF0tW3bTkOGDJOHB79G4Lr42w8AqJA1a1YqIWG1JOnAgX2SpGHDRtVmSUCt4nEeAKBC9u79rszXgKshRAEAKiQ//+cyXwOuhhAFAABgACEKAADAAEIUAACAAYQoAAAAAwhRAAAABrBPFAAAVSwgoGFtl1Bp7u5uxV4783VlZV2r8jGZiQIAADCAmSgAAKrJn15dpdz8gtouwxDPrKtFZlp+zLqqHv9YXmv1GOHt6aHNr0ZX2/iEKAAAqklufoHyLM4ZourZbEVe22w2p72W6sLjPAAAAAMIUQAAAAYQogAAAAxgTRQAwwoKCrRmzUodPZqutm3baciQYfLw4McKANfATzsAhq1Zs1IJCaslSQcO7JMkDRs2qjZLAoAaw+M8AIbt3ftdma8BoC4jRAEwLD//5zJfA0BdRogCAAAwgBAFAABgACEKAADAAEIUAACAAWxxgCrFvkEAAFfBbzdUKfYNAgC4Ch7noUqxbxAAwFU4zUzU119/rWXLlik1NVU2m02/+93v1L9/fz3xxBPy8vKq8DiZmZmKjY3Vzp07lZGRIX9/f0VFRSkmJkZNmjSpxitwDewbBABwFU4xEzV37lw999xz+v7773XXXXcpIiJCly5d0gcffKCHH35YV65cqdA4p06d0pAhQ7RixQp5eXmpd+/ecnd31+eff65BgwYpIyOjmq8EAADUFQ4folJSUjRv3jw1atRIa9eu1dKlS7VgwQJt2rRJ9913n44cOaIPPvigQmNNmTJFWVlZiomJ0fr16zVnzhxt3LhRw4cPV0ZGhqZPn169FwMAAOoMhw9RCQkJkqS//vWvat++vf24j4+PJk6cKEnatm1bueOkpKQoJSVFrVq10vjx4+3H3d3dNW3aNAUGBmrbtm06evRo1V4AAACokxw+RL322mv617/+peHDhxdrs1qtkm4GofJs3bpVktSnTx+5uRW97Hr16ikqKkqStGXLlsqWDAAAXIDDLyz38PBQmzZtih0/d+6c3n77bUnS4MGDyx0nPT1dktSuXbsS29u2bStJSktLM1oqAABwIQ4fon7rrbfe0r59+7Rv3z6ZTCaNGTNG48aNK/e88+fPS5KaNWtWYnvTpk2L9AMAACiL04WoNWvW6OrVq5IkT09PZWVl6cKFC/YQVJqcnBxJkre3d4nthdskFPYDAAAoi9OFqMTERDVu3Fjp6emaNWuW1q9fr71792r9+vXy8fEp9bzCdVMmk6nM8W02W5XWCwCSFBDQsLZLqDR3d7dir535urKyrtV2CQ7N5uZR5ms4wcLy32rRooW8vLzUsWNHLV68WO3atdPp06e1atWqMs+rX7++JCk3N7fE9ry8PEmlz1QBAOBKCm4PLvM1nHAm6tc8PT3Vr18/paen6+DBg2X2bdq0qVJTU5WVlVVie+FaqPIeCwJAZfzp1VXKzS+o7TIM8cy6WuRf3j9mXVWPfyyvtXqM8Pb00OZXo2u7DKeQ1+JuSZL7T1my1g+wv8YvHD5EzZkzR8eOHdOUKVPUvHnzYu2enp6SpIKCsn8ohYaGKikpqdR9oAqPh4aGVrJiAChdbn6B8izOGaLq/Wa5g81mc9prQQWY3JQXGF7bVTg0h3+ct2PHDn399df66quvSmzfvn27JCksLKzMcSIjIyVJmzZt0o0bN4q0WSwW+/5QvXv3rmzJAADABTh8iBo1apQkad68edq/f7/9uMVi0Xvvvac9e/bIz89PQ4YMsR83m80ym82yWCz2/p07d1ZYWJjMZrNmz55tX0ButVr1xhtv6Ny5c+rZs6c6dOhQg1cHAACclcM/znvooYeUkpKilStXatiwYQoPD1ejRo106NAhZWRkyNfXVwsWLFCjRo0kSZmZmerfv7+km7uPBwf/shBu5syZeuSRR7R48WJt2bJFISEhOnTokE6ePKmgoCC9/vrrtXKNAADA+Tj8TJQkzZgxQ++//766du2qtLQ0JScn67bbbtNjjz2mxMRE3X13xRa7hYSEaO3atRo8eLCuXbumpKQkSdLo0aO1atWqUjfiBAAA+C2Hn4kq1L9/f/sMU1mCg4N1+PDhUtuDgoI0c+bMqiwNAAC4IKeYiQIAAHA0hCgAAAADCFEAAAAGEKIAAAAMIEQBAAAYQIgCAAAwgBAFAABgACEKAADAAEIUAACAAYQoAAAAAwhRAAAABhCiAAAADCBEAQAAGOBR2wXgFwEBDWu7hEpzd3cr9tqZrysr61ptlwAAcFDMRAEAABjATJQD+tOrq5SbX1DbZRjimXW1SDL/Meuqevxjea3VY4S3p4c2vxpd22UAABwcIcoB5eYXKM/inCGqns1W5LXNZnPaa6luzvyYsxCPbwG4Mh7nAQAAGMBMFFDLeHxbu3h8C8AoQhRQy3h8CwDOicd5AAAABhCiAAAADCBEAQAAGECIAgAAMIAQBQAAYAAhCgBQITY3jzJfA66GEAUAqJCC24PLfA24Gv4ZAQCokLwWd0uS3H/KkrV+gP014KoIUQCAijG5KS8wvLarABwGj/MAAAAMIEQBAAAYQIgCAAAwgBAFAABggNMsLF+3bp3i4+OVlpam3Nxc+fn5qXv37ho7dqzatGlToTHy8vLUuXNnWa3WUvvEx8crLCysqsoGAAB1lMOHKJvNpsmTJ2vDhg3y8PBQWFiYmjRporS0NH355Zf697//rfnz5+uPf/xjuWOlpaXJarUqMDBQXbp0KbGPr69vFV8BAACoixw+RCUmJmrDhg0KCAjQRx99pPbt20uSrFar5syZo4ULF+qll17Spk2bVL9+/TLHSk1NlSQNGDBAkyZNqvbaAQBA3eXwa6Li4+MlSS+88II9QEmSu7u7nnvuOYWEhOjixYvasWNHuWMVhige1wEAgMpy+BDVqFEjtWnTRl27di3WZjKZ1Lp1a0lSZmZmuWMRogAAQFVx+Md58+fPL7XNarXag1GLFi3KHCc/P19ms1m+vr7atWuXVq5cqaNHj8pms6ljx4566qmn1KNHjyqtHQAA1F0OPxNVluXLl+vMmTPy9fVVREREmX3T0tJksViUnZ2tqVOnSpK6desmf39/7dy5U2PGjNGiRYtqomwAAFAHOPxMVGl27dqld955R5I0efLkcheVHzx4UJLUtGlTxcbGFnmkl5CQoJdfflmzZ89WeHi47rnnnuorHAAA1AlOOROVlJSkcePGKT8/XyNGjNDQoUPLPSc6OlpJSUkl7gM1aNAgjRw5UjabTXFxcdVVNgAAqEOcLkQtXbpUMTExysvL06hRozR9+vQKnefm5qbAwEA1a9asxPY+ffpIkg4cOFBltboim5tHma8BAKgrnOY3XEFBgWbMmKGVK1fKZDLp+eef17hx46ps/ObNm0uScnNzq2xMV1Rwe7A8ci8VeQ0AQF3kFCEqLy9PMTExSk5Olre3t9566y098MADtzRGbGys0tLSNGLEiBIXoWdkZEj6JUzBmLwWd0uS3H/KkrV+gP01AAB1jcOHKKvVag9Qfn5+WrhwoTp27HjL4xw/flwbN26Ul5dXiSEqISFBktSrV6/KluzaTG7KCwyv7SoAAKh2Dr8masGCBUpOTpaPj48+++yzcgOUxWKR2WyW2WyWxWKxHx85cqRMJpMSExOVmJhY5Jy4uDitW7dOvr6+evTRR6vlOgAAQN3i0DNRV65c0ZIlSyTd3Jrgww8/LLXvgAEDFBkZqczMTPXv31+StGXLFgUH31yTEx4erkmTJmnWrFl68cUXtWTJErVs2VJHjhzRsWPH5OPjo/nz58vPz6/6LwwAADg9hw5Re/bsUU5OjiTpxIkTOnHiRKl9O3TooMjIyDLHGzt2rMLCwvTJJ59o3759MpvNCggIUHR0tMaNG6egoKCqLB8AANRhDh2i+vbtq8OHD9/SOcHBwWWeExERUe7u5gAAAOVx+DVRAAAAjogQBQAAYAAhCgAAwABCFAAAgAGEKAAAAAMIUQAAAAYQogAAAAwgRAEAABhAiAIAADCAEAUAAGAAIQoAAMAAQhQAAIABhCgAhtncPMp8DQB1GSEKgGEFtweX+RoA6jL+2QjAsLwWd0uS3H/KkrV+gP01ALgCQhQA40xuygsMr+0qAKBW8DgPAADAAEIUAACAAYQoAAAAAwhRAAAABhCiAAAADCBEAQAAGECIAgAAMIAQBQAAYAAhCgAAwABCFAAAgAGEKAAAAAMIUQAAAAYQogAAAAwgRAEAABhAiAIAADCAEAUAAGAAIQoAAMAAj9ouoKLWrVun+Ph4paWlKTc3V35+furevbvGjh2rNm3aVHiczMxMxcbGaufOncrIyJC/v7+ioqIUExOjJk2aVOMVAACAusThZ6JsNpteeOEFvfTSS/r+++/Vpk0b9ezZU+7u7vryyy81ePBgJScnV2isU6dOaciQIVqxYoW8vLzUu3dvubu76/PPP9egQYOUkZFRzVcDAADqCocPUYmJidqwYYMCAgK0Zs0arVixQrGxsdq0aZPGjRunvLw8vfTSS/rpp5/KHWvKlCnKyspSTEyM1q9frzlz5mjjxo0aPny4MjIyNH369Bq4IgAAUBc4fIiKj4+XJL3wwgtq3769/bi7u7uee+45hYSE6OLFi9qxY0eZ46SkpCglJUWtWrXS+PHji4wzbdo0BQYGatu2bTp69Gj1XAgAAKhTHD5ENWrUSG3atFHXrl2LtZlMJrVu3VrSzbVOZdm6daskqU+fPnJzK3rZ9erVU1RUlCRpy5YtVVE2AACo4xx+Yfn8+fNLbbNarUpNTZUktWjRosxx0tPTJUnt2rUrsb1t27aSpLS0NCNlAgAAF+PwM1FlWb58uc6cOSNfX19FRESU2ff8+fOSpGbNmpXY3rRp0yL9AAAAyuK0IWrXrl165513JEmTJ09W/fr1y+yfk5MjSfL29i6x3cvLq0g/AACAsjhliEpKStK4ceOUn5+vESNGaOjQoeWe4+7uLunmOqqy2Gy2KqkRAADUbU4XopYuXaqYmBjl5eVp1KhRFd6WoHCmKjc3t8T2vLw8SaXPVAEAAPyawy8sL1RQUKAZM2Zo5cqVMplMev755zVu3LgKn9+0aVOlpqYqKyurxPbCtVCFa6MAAADKUmUhqqCgQLt379aJEyd09epV+fn5qW3bturcuXOlx87Ly1NMTIySk5Pl7e2tt956Sw888MAtjREaGqqkpKRS94EqPB4aGlrpegEAQN1X6RBVUFCgjz/+WIsXL9b169eLtTdt2lTPP/+8Bg4caGh8q9VqD1B+fn5auHChOnbseMvjREZGauHChdq0aZOeffbZIntFWSwW+/5QvXv3NlQnAABwLZVaE2Wz2TRp0iS9//77unbtmjw9PRUaGqrw8HC1bdtWHh4eyszM1NSpU+2fpLtVCxYsUHJysnx8fPTZZ5+VG6AsFovMZrPMZrMsFov9eOfOnRUWFiaz2azZs2fbF5BbrVa98cYbOnfunHr27KkOHToYqhMAALiWSs1EJSQk6JtvvpG3t7emTp2qgQMHytPT096el5en+Ph4vffee/rkk0/0xz/+UX/4wx8qPP6VK1e0ZMkSSTdntD788MNS+w4YMECRkZHKzMxU//79Jd3cfTw4ONjeZ+bMmXrkkUe0ePFibdmyRSEhITp06JBOnjypoKAgvf7667f6nwAAALioSoWoVatWyWQy6f3331evXr2KtXt5eemRRx6Rn5+fnn/+ecXFxd1SiNqzZ49936YTJ07oxIkTpfbt0KGDIiMjyxwvJCREa9eu1bx58/Ttt98qKSlJzZs31+jRozVu3Dj5+/tXuDYAAODaKhWizGazgoODSwxQv9avXz+9++672r9//y2N37dvXx0+fPiWzgkODi7znKCgIM2cOfOWxgQAAPitSu8T1bBhwwr1a9KkSal7NAEAADibSoWosLAwpaen68yZM2X2y87O1pEjR3TXXXdV5u0AAAAcRqVC1MSJEyVJEyZM0IULF0rs8/PPP2vKlCmyWCx65plnKvN2AAAADqNSa6IuXryooUOH6osvvtADDzyg/v37KywsTL6+vsrJydGRI0f01VdfKSMjQ23bttWePXu0Z8+eYuM8++yzlSkDAACgxlUqRMXExNi/0Pf69etavXq1Vq9eXaRP4X5MR48eLbZbuM1mk8lkIkQBAACnU6kQdc8991RVHQAAAE6lUiFq6dKlVVUHAACAU6n0FgcAAACuiBAFAABgQIUf5/Xq1Usmk0lxcXH63e9+Zz92K0wmk5KSkm7pHAAAAEdU4RCVkZEhk8mkgoKCIsduReEn+QAAAJxdhUNU4ffNBQQEFDsGAADgaiocogYNGlTs2JkzZxQYGKjBgweXe/6CBQt07NixEscBAABwNpVaWD5v3jytXbu2Qn03bdqkzZs3V+btAAAAHEaFZ6LOnDmjXbt2FTuelZWl+Pj4Us+z2Ww6e/as0tPT5ePjY6xKAAAAB1PhEOXn56e5c+fq/Pnz9mMmk0knT57U//zP/5R7vs1mU0REhLEqAQAAHEyFQ5SXl5cmT56s999/337s7Nmz8vT0lL+/f6nnubm5ycfHR3feeadeeumlylULAADgIG7pa18GDBigAQMG2F+3b99eYWFhWrZsWZUXBgAA4Mgq9d1548ePV4sWLaqqFgAAAKdR6RAFAADgivjuPAAAAAMIUQAAAAYQogAAAAwgRAEAABhAiAIAADCAEAUAAGAAIQoAAMAAQhQAAIABhCgAAAADCFEAAAAGEKIAAAAMIEQBAAAYUKkvIK4tJ06c0MCBAzV48GC98sorFT4vLy9PnTt3ltVqLbVPfHy8wsLCqqJMAABQhzldiLpw4YKeeeYZ5ebm3vK5aWlpslqtCgwMVJcuXUrs4+vrW8kKAQCAK3CqEHXo0CE9++yz+vHHHw2dn5qaKkkaMGCAJk2aVJWlAQAAF+MUIerKlStatGiR4uLilJ+fr+DgYJ0+ffqWxykMUTyuAwAAleUUC8vj4uL00UcfqUmTJlqwYIEGDhxoaBxCFAAAqCpOMRPVvHlz/f3vf9fIkSPl5eVlD0O3Ij8/X2azWb6+vtq1a5dWrlypo0ePymazqWPHjnrqqafUo0ePaqgeAADURU4xEzV06FCNGTNGXl5ehsdIS0uTxWJRdna2pk6dKknq1q2b/P39tXPnTo0ZM0aLFi2qqpIBAEAd5xQzUVXh4MGDkqSmTZsqNja2yCO9hIQEvfzyy5o9e7bCw8N1zz331FaZAADASbhMiIqOjlbPnj3l7u6uZs2aFWkbNGiQUlNTtXTpUsXFxRGiAABAuZzicV5VcHNzU2BgYLEAVahPnz6SpAMHDtRkWQAAwEm5TIgqT/PmzSXJ0CaeAADA9bhMiIqNjdXEiRO1a9euEtszMjIk/RKmAAAAyuIya6KOHz+ujRs3ysvLSxEREcXaExISJEm9evWq4coAAIAzqnMzURaLRWazWWazWRaLxX585MiRMplMSkxMVGJiYpFz4uLitG7dOvn6+urRRx+t6ZIBAIATqnMzUZmZmerfv78kacuWLQoODpYkhYeHa9KkSZo1a5ZefPFFLVmyRC1bttSRI0d07Ngx+fj4aP78+fLz86vN8gEAgJOocyGqLGPHjlVYWJg++eQT7du3T2azWQEBAYqOjta4ceMUFBRU2yUCAAAn4ZQhasKECZowYUKJbcHBwTp8+HCp50ZERJS4JgoAAOBW1Lk1UQAAADWBEAUAAGAAIQoAAMAAQhQAAIABhCgAAAADCFEAAAAGEKIAAAAMIEQBAAAYQIgCAAAwgBAFAABgACEKAADAAEIUAACAAYQoAAAAAwhRAAAABhCiAAAADCBEAQAAGECIAgAAMIAQBQAAYAAhCgAAwABCFAAAgAGEKAAAAAMIUQAAAAYQogAAAAwgRAEAABhAiAIAADCAEAUAAGAAIQoAAMAAQhQAAIABhCgAAAADCFEAAAAGEKIAAAAMIEQBAAAYQIgCAAAwwClD1IkTJ9SpUyfNmDHjls/NzMzU9OnT1bdvX4WFhal379763//9X126dKkaKgUAAHWV04WoCxcu6JlnnlFubu4tn3vq1CkNGTJEK1askJeXl3r37i13d3d9/vnnGjRokDIyMqqhYgAAUBc5VYg6dOiQRo4cKbPZbOj8KVOmKCsrSzExMVq/fr3mzJmjjRs3avjw4crIyND06dOruGIAAFBXOUWIunLlit59911FR0frxx9/VHBw8C2PkZKSopSUFLVq1Urjx4+3H3d3d9e0adMUGBiobdu26ejRo1VZOgAAqKOcIkTFxcXpo48+UpMmTbRgwQINHDjwlsfYunWrJKlPnz5ycyt62fXq1VNUVJQkacuWLZWuFwAA1H1OEaKaN2+uv//979q4caM97Nyq9PR0SVK7du1KbG/btq0kKS0tzViRAADApXjUdgEVMXTo0EqPcf78eUlSs2bNSmxv2rRpkX4AAABlcYqZqKqQk5MjSfL29i6x3cvLq0g/AACAsrhMiHJ3d5ckmUymMvvZbLaaKAcAADg5lwlR9evXl6RS95fKy8uTVPpMFQAAwK+5TIgqXPOUlZVVYnvhWqjCfgAAAGVxmRAVGhoqSaXuA1V4vLAfAABAWVwmREVGRkqSNm3apBs3bhRps1gs9v2hevfuXeO1AQAA51PnQpTFYpHZbJbZbJbFYrEf79y5s8LCwmQ2mzV79mz7AnKr1ao33nhD586dU8+ePdWhQ4faKh0AADgRp9gn6lZkZmaqf//+km7uPv7rr4iZOXOmHnnkES1evFhbtmxRSEiIDh06pJMnTyooKEivv/56bZUNAACcTJ2biSpLSEiI1q5dq8GDB+vatWtKSkqSJI0ePVqrVq0qdSNOAACA33LKmagJEyZowoQJJbYFBwfr8OHDpZ4bFBSkmTNnVldpAADARbjUTBQAAEBVIUQBAAAYQIgCAAAwgBAFAABgACEKAADAAEIUAACAAYQoAAAAAwhRAAAABhCiAAAADCBEAQAAGECIAgAAMIAQBQAAYAAhCgAAwABCFAAAgAGEKAAAAAMIUQAAAAYQogAAAAwgRAEAABhAiAIAADCAEAUAAGAAIQoAAMAAQhQAAIABhCgAAAADCFEAAAAGEKIAAAAMIEQBAAAYQIgCAAAwgBAFAABgACEKAADAAEIUAACAAYQoAAAAAwhRAAAABhCiAAAADPCo7QIq6vjx45o/f76+++47Xbx4Uc2bN1e/fv309NNPy8fHp8Lj5OXlqXPnzrJaraX2iY+PV1hYWFWUDQAA6iinCFH79+/XY489ppycHHXs2FFhYWH6/vvvtXDhQiUlJWn58uVq0KBBhcZKS0uT1WpVYGCgunTpUmIfX1/fKqweAADURQ4fogoKCjRp0iTl5OTojTfe0MMPPyzp5ozS888/r61bt2r27Nl65ZVXKjReamqqJGnAgAGaNGlStdUNAADqNodfE/XVV1/p1KlTioiIsAcoSfLy8tKbb74pHx8frVq1SleuXKnQeIUhisd1AACgMhw+RG3dulWS1Ldv32JtjRs3Vrdu3WSxWPTtt99WaDxCFAAAqAoOH6LS09MlSaGhoSW2t23bVtLNtU7lyc/Pl9lslq+vr3bt2qXhw4era9eu6tKli5544gnt2LGj6goHAAB1msOHqPPnz0uSmjVrVmJ706ZNi/QrS1pamiwWi7KzszV16lRJUrdu3eTv76+dO3dqzJgxWrRoURVVDgAA6jKHX1iek5Mj6eYaqJIUHi/sV5aDBw9Kuhm8YmNjizzSS0hI0Msvv6zZs2crPDxc99xzT2VLBwAAdZjDhyh3d3fduHFDJpOpzH42m63csaKjo9WzZ0+5u7sXm9kaNGiQUlNTtXTpUsXFxRGiAABAmRz+cV79+vUlSbm5uSW25+XlSZK8vb3LHcvNzU2BgYGlPhrs06ePJOnAgQNGSgUAAC7E4UNU4ZqnrKysEtsL10IV9quM5s2bSyo9sAEAABRy+BBV+Km8o0ePltheeLy0T+/9WmxsrCZOnKhdu3aV2J6RkSHplzAFAABQGocPUZGRkZKkjRs3Fmu7fPmydu/erXr16qlHjx7ljnX8+HFt3LhRCQkJJbYXHu/Vq5fxggEAgEtw+BDVt29fBQYGKjk5WcuWLbMfz8vL08svv6ycnBw9/PDD8vf3t7dZLBaZzWaZzWZZLBb78ZEjR8pkMikxMVGJiYlF3icuLk7r1q2Tr6+vHn300eq/MAAA4NQc/tN5Xl5eeuuttzR27FjNmDFDa9asUXBwsPbu3avz58/rzjvv1OTJk4uck5mZqf79+0uStmzZouDgYElSeHi4Jk2apFmzZunFF1/UkiVL1LJlSx05ckTHjh2Tj4+P5s+fLz8/vxq/TgAA4FwcfiZKurkh5urVq/XnP/9ZZ8+e1bZt29SwYUM988wzWrp0qRo0aFDhscaOHatPP/1UkZGRysjI0NatW5WXl6fo6Ght2LBBXbt2rcYrAQAAdYXDz0QVateunebMmVOhvsHBwTp8+HCp7REREYqIiKiq0gAAgAtyipkoAAAAR0OIAgAAMIAQBQAAYAAhCgAAwABCFAAAgAGEKAAAAAMIUQAAAAYQogAAAAwgRAEAABhAiAIAADCAEAUAAGAAIQoAAMAAQhQAAIABhCgAAAADCFEAAAAGEKIAAAAMIEQBAAAYQIgCAAAwgBAFAABgACEKAADAAEIUAACAAYQoAAAAAwhRAAAABhCiAAAADCBEAQAAGECIAgAAMIAQBQAAYAAhCgAAwABCFAAAgAGEKAAAAAMIUQAAAAYQogAAAAwgRAEAABjgUdsFVNTx48c1f/58fffdd7p48aKaN2+ufv366emnn5aPj88tjZWZmanY2Fjt3LlTGRkZ8vf3V1RUlGJiYtSkSZNqugIAAFCXOMVM1P79+zV48GCtX79e/v7+6tWrl3JycrRw4UINHz5c169fr/BYp06d0pAhQ7RixQp5eXmpd+/ecnd31+eff65BgwYpIyOjGq8EAADUFQ4fogoKCjRp0iTl5OTojTfe0OrVqzVnzhxt3rxZUVFROnz4sGbPnl3h8aZMmaKsrCzFxMRo/fr1mjNnjjZu3Kjhw4crIyND06dPr8arAQAAdYXDh6ivvvpKp06dUkREhB5++GH7cS8vL7355pvy8fHRqlWrdOXKlXLHSklJUUpKilq1aqXx48fbj7u7u2vatGkKDAzUtm3bdPTo0Wq5FgAAUHc4fIjaunWrJKlv377F2ho3bqxu3brJYrHo22+/rfBYffr0kZtb0UuvV6+eoqKiJElbtmypbNkAAKCOc/iF5enp6ZKk0NDQEtvbtm2rpKQkpaWl6cEHH6zQWO3atSt1LElKS0szWm6V8PZ0+NtSp9X0f3/ud+3ifrsW7rdrqe7//g5/d8+fPy9JatasWYntTZs2LdKvpsa6FR4ebvL1rfgnCDe/Gl2l7w/jbuW+GcX9dhzcb9fC/XYtt3K/PTwq9qDO4UNUTk6OpJtroEpSeLywX0XG8vb2rvRYt8JkMqlePfcqHRM1g/vmWrjfroX77Vqq4347/Jood/ebF20ymcrsZ7PZanQsAADg2hw+RNWvX1+SlJubW2J7Xl6epNJnl6prLAAA4NocPkQVrlPKysoqsb1w/VJhv5oaCwAAuDaHD1GFn8orbe+mwuOlfXqvusYCAACuzeFDVGRkpCRp48aNxdouX76s3bt3q169eurRo0eFx9q0aZNu3LhRpM1isdj3h+rdu3dlywYAAHWcw4eovn37KjAwUMnJyVq2bJn9eF5enl5++WXl5OTo4Ycflr+/v73NYrHIbDbLbDbLYrHYj3fu3FlhYWEym82aPXu2fQG51WrVG2+8oXPnzqlnz57q0KFDzV0gAABwSiabE3wUbffu3Ro7dqzy8vL0+9//XsHBwdq7d6/Onz+vO++8U0uXLlWDBg3s/U+fPq0+ffpIurn7eHBwsL3tyJEjeuSRR5Sdna077rhDISEhOnTokE6ePKmgoCB98cUXpe4jBQAAUMjhZ6IkqVu3blq9erX+/Oc/6+zZs9q2bZsaNmyoZ555pliAKk9ISIjWrl2rwYMH69q1a0pKSpIkjR49WqtWrSJAAQCACnGKmSgAAABH4xQzUQAAAI6GEAUAAGAAIQoAAMAAQhQAAIABhCgAAAADCFEAAAAGEKIAAAAMIEQBAGoE2xKirvGo7QJQe+bOnat58+bd0jnjx4+XJM2bN0+jRo3SK6+8Uh2loZb9+quTytOwYUOlpKSU22/06NHas2eP/vnPf+qBBx6obImoAtevX9fy5cu1detWnThxQtevX1ejRo3UunVrRUZGasSIEWrYsGGVvNfWrVv1+eef6+OPP66S8VC+mry/rooQ5cJCQ0M1YMCAIsdyc3O1efNmSSrWVnjO4cOHa6Q+OIaS/h78mo+PTw1VgqqUnp6uJ554QhcuXFBAQIA6duwoHx8fZWVlKS0tTSkpKfr444+1aNEidezYsVLvlZaWpr/97W8KCgqqoupRnpq8v66MEOXC7r//ft1///1Fjp0+fdoeot57770SzyNEuZbS/h7cqrffflu5ubl8P6UDsFqtmjBhgi5cuKCJEyfqb3/7m9zcflndcf36dc2cOVPx8fF6+umntXnzZtWvX9/w+/EYr2bV9P11ZayJAlAjAgMD1aZNm1v6wnBUj++++04nTpxQu3btFBMTU+QXrCQ1aNBAM2bM0B133KFLly5p48aNtVQpjOD+1hxCFCrl+++/11NPPaWuXbuqU6dOGjJkiBISEor1Gz16tEJDQ/Xvf/+7WNvp06cVGhqq8PDwIsejoqJ055136tSpUxo1apTuuusu9ejRQ/Hx8dV2PTBu7ty5Cg0N1Zdffqm3335bXbt2VXh4uMaNGyep7L8DqFkXL14st4+7u7vGjBmjwYMHy8/Pr1j79u3bNX78ePXs2VN33XWXwsPD9Ze//EXvvPOOLl++bO83ZcoUDRw4UJJ05swZhYaGKioqqsquBcVV5v7u3r1boaGhevDBB0s8b8qUKQoNDdWSJUvsx9auXavQ0FDFxsbqyJEjmjhxorp3766wsDA9+OCDWrJkiQoKCip/YQ6Ix3kwbMeOHVqxYoWaN2+u7t276+zZs/rPf/6jKVOmKCsrS2PHjq30e9hsNj311FPKyclRr169lJqaqrCwsCqoHtXlww8/1MmTJ9WjRw9dvXpVrVu3ru2S8BsdOnSQyWRSenq6Xn/9dT399NMKCAgo1m/o0KEaOnRosePvvfeeFi9eLA8PD3Xu3FmdOnVSVlaW9u3bp6NHj+r//u//tHbtWnl6eio8PFyXLl3S9u3b5ePjoz59+qhJkyY1cZkuq7L316h9+/bpww8/VP369dWpUyddv35dKSkpeuedd3T8+HG9/vrrVfZejoIQBcNOnDihJ598UpMnT7ZPF3/44YeaPXu2lixZoieffFLu7u6Veo8bN25Ikr7++ms1aNBAN27cKDY1Dcdy7NgxLVq0SJGRkZJ+uYdwHK1atdLIkSO1bNkyLV26VMuWLdNdd92lrl27qkuXLurSpYsaN25c4rlpaWn66KOP1KhRI61YsUJt2rSxtx05ckTDhw/XkSNHtHPnTvXq1UvDhg1Tx44dtX37djVu3LjK1tihdJW5v5Wxbds2DRgwQDNmzLB/4GTTpk0aP3684uPj9eyzz5YY5pwZIQqGBQUFFQlQkvTEE0/on//8p7Kzs3Xu3DkFBwdX+n2GDBliX0dDgKp5oaGhpbbde++9Wrp0aZFjd9xxhz1ASdwzRzVt2jS1bNlSsbGxys7O1v79+7V//359/PHHMplMuvvuuzV8+HANHDhQJpPJfl52drb+/Oc/Kzw8vEiAkqSQkBB1795dmzdv1unTp2v6kvArRu9vZXh7e+u1114r8ondvn37Kjg4WKdPn9bhw4cJUUChTp06FfsF6enpKX9/f2VmZuratWtV8j4dOnSoknFgTFlbHPz2l6jE/XIWbm5ueuyxxzRixAjt2rVLycnJ+u6775SWliar1aoffvhBP/zwgxISErRw4UL7L8bu3bure/fuRca6ceOGzpw5o9TUVHt4ys/Pr/Frwi+M3t/KaN++fYmf8mvatKlOnz6t3NzcSr+HoyFEwbBGjRqVeNzD4+Zfq6paSHj77bdXyTgw5lYfv5T29wKOydPTU5GRkfbZw+vXr2vPnj1KSEjQN998o927d+vtt9/Wa6+9Zj8nPz9f//rXv7Rx40YdO3ZMZ8+etYemwlkNtjVwDEbur1Hl/U6wWq2Vfg9Hwzw7DKuqxzTlrZmpqqlm1Awe3zm+tLQ07dq1q8TZogYNGigqKkpz587VlClTJEmJiYn29osXL2rgwIH6+9//rh07dsjPz0+DBg3StGnTtGbNGj300EM1dh0oWWXub3nKCkKu+LOan3aoEYX/c5UUmLKzs2u4GsC1Pfnkk3r88ce1f//+MvtFR0dLknJycpSXlydJmj17tsxmsyIiIvTtt99q+fLlmjFjhkaPHq277rpLV69erfb6UbbK3N/CfwSVFpauXLlShZU6P0IUakTh8/YLFy4Ua9u7d29NlwO4tC5dukhSud9jd+zYMUk3P+3l5eUl6ebecJL0+OOPF3vUfv36dfv/z7/+B5MrzlDUpsrc38Kf1ZcuXSr2SNZiseg///lPVZfr1AhRqBHt27eXJK1evVrXr1+3H09NTdXChQtrqyzAJT3zzDO67bbbtGXLFr3wwgvKzMws1uc///mPJk+eLEn2DVMl2T8av3nz5iK/ZC9duqRnn33WPrP8888/29s8PT0l3QxZbHlR/Spzf1u3bi1PT09lZ2dr7dq19uMWi0Wvv/56hTbydCUsLEeNGD58uL744gulp6fr/vvvV+fOnXX58mV9//33ioyM1A8//FDkhy6A6tO+fXvNmzdPL774ojZs2KB//etf6tChg4KCgnTjxg0dP35cZrNZbm5uGj9+vAYNGmQ/d8yYMfr++++1evVqfffddwoJCVF2drb27t2r/Px8hYSE6MiRI0VmnVu0aKHbbrtNV65c0fDhw/Vf//Vf7BdVjSpzf318fDR69GgtWbJE//jHPxQfHy8/Pz/98MMPunr1qh588EFt2LChFq/OsTAThRrRvHlzrVq1yv5x+e3bt+vy5cuaPHmy5s2bV+lNOQHcmp49e+qbb77RCy+8oHvvvVdZWVnavn27du7cKavVqmHDhmnNmjWaMGFCkfP+9Kc/6bPPPtMf/vAHXblyRdu3b1dmZqbuu+8+ffrpp3r33XclSUlJSfZZJ29vb7377rtq1aqVDh48qB07dhT5ahhUPaP3V5JefPFFvfrqq7rzzjuVmpqqPXv2qFOnToqPj1fXrl1r4Wocl8nG51ABAABuGTNRAAAABhCiAAAADCBEAQAAGECIAgAAMIAQBQAAYAAhCgAAwABCFAAAgAGEKAAAAAMIUQAAAAYQogDgV06fPq3Q0FCFhobqxx9/rO1yADgwQhQAAIABhCgAAAADCFEAAAAGEKIAAAAM8KjtAgCgNhw8eFBLlixRSkqKLl++rJYtW2r48OHq2bNnqefs2bNHq1ev1t69e3XhwgUVFBSocePG6tSpk0aOHKmIiAh731mzZmnRokVq166d1q9fX+J4KSkpGjVqlBo1aqTk5GTddtttVX6dAKoPM1EAXE5iYqKio6O1YcMG5ebmKiQkRFlZWZoxY4b+8Y9/lHjOrFmzNHr0aCUmJuqnn37SHXfcocDAQF26dEnffPONHn/8ca1cudLef8iQIZKk9PR0paWllTjmunXrJEl/+ctfCFCAEyJEAXApp06d0rRp02SxWPTEE08oOTlZa9as0Y4dO/TCCy9oz549xc7ZvXu3Fi1aJDc3N7355pvasWOH1q5dq2+++UZbtmzRvffeK0maM2eObty4IUlq1aqVunTpIumXsPRrP//8s77++mtJ0uDBg6vrcgFUI0IUAJfy0Ucf6eeff9a9996rKVOmyNPTU5Lk7u6usWPHlhhovv32W3l6eqpv374aMmSI3Nx++dHZvHlzPfvss5KkCxcu6OLFi/a2wtmo9evXy2q1Fhlz8+bNunbtmkJCQtSxY8cqv04A1Y8QBcClbN++XVLpsz8jRowodmzy5Mnav3+/3n333RLP8fLysv85Ly/P/ud+/frJx8dHWVlZ2rlzZ5FzEhISyqwDgONjYTkAl5GXl6dz585JkkJCQkrs0759e5lMJtlstiLHTSaT3NzclJKSoqNHj+rUqVM6efKkDh8+XGRn88LHeZLk4+Ojfv36ac2aNVq3bp3uu+8+SbKHKg8PDz300ENVfZkAagghCoDLuHLliv3PPj4+Jfbx9PSUt7e3cnJy7MdsNps+++wzLVmyROfPn7cfN5lMat26tf77v/+7xHVP0s1HemvWrNHmzZv1008/qX79+kpMTJTValVUVJT8/f2r6OoA1DRCFACX0bhxY/ufr1+/XmIfm82m/Pz8Isfmz5+vuXPnSpL69++vnj17qm3btrrjjjtUv359nThxotQQ1aVLF7Vu3VrHjx/X5s2biwSuwjVTAJwTa6IAuAxPT08FBQVJkg4dOlRin2PHjqmgoMD+2mKxaMmSJZKkmJgYvf/++xo0aJDCwsJUv359SVJGRkaZ71u47mnTpk06deqUDh8+rCZNmigyMrLS1wSg9hCiALiU+++/X5K0cuXKYp+Yk6TVq1cXeX358mX7o73f//73JY7563N+HcAKDRo0SB4eHkpOTtaGDRskSQ899JDq1atn7CIAOARCFACX8uSTT8rX11epqamaOnWq/bGezWbT8uXLFRcXV6R/kyZN5OvrK0n69NNPi6yrunTpkl599VV7MJKKfjqvUEBAgO677z7l5uZq0aJFkvhUHlAXmGy//QgKANRxu3bt0vjx43X9+nX5+PioTZs2ysjIUFZWlqKiorR9+3ZZrVZ98803atmypZYvX67XXntNkuTt7a1WrVopPz9fP/74owoKCnTnnXfq3Llzunz5smJjY9WnT59i77l582bFxMRIujmjtXbt2hq9ZgBVj5koAC4nIiJCCQkJGjZsmBo3bqzDhw/L29tbEyZM0Jw5c4r1HzlypD799FP16NFDDRs21JEjR3Tx4kXdfffdeuWVV7Rq1Sr7+qakpKQS37NXr172he0sKAfqBmaiAKAGXL58Wffdd59MJpOSk5N1++2313ZJACqJmSgAqAFffvmlLBaL7r//fgIUUEewTxQAVJP09HQ1aNBAP/zwg/0x4eOPP167RQGoMoQoAKgmb7/9tpKTk+2vo6OjFRYWVosVAahKhCgAqCbh4eHat2+f6tWrp4EDB2rSpEm1XRKAKsTCcgAAAANYWA4AAGAAIQoAAMAAQhQAAIABhCgAAAADCFEAAAAGEKIAAAAMIEQBAAAYQIgCAAAw4P8BWzi3MNSenygAAAAASUVORK5CYII=",
"text/plain": [
"
"
],
"text/plain": [
" Source SS DF MS F p-unc np2\n",
"0 day 7.446900 3.0 2.482300 1.298061 0.275785 0.016233\n",
"1 sex 1.594561 1.0 1.594561 0.833839 0.362097 0.003521\n",
"2 day * sex 2.785891 3.0 0.928630 0.485606 0.692600 0.006135\n",
"3 Residual 451.306151 236.0 1.912314 NaN NaN NaN"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Extend ANOVA\n",
"pg.anova(data=tips, between=['day', 'sex'], dv='tip')"
]
},
{
"cell_type": "markdown",
"id": "261e294d-79fa-4c0a-9515-c12c275f4819",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"source": [
"Further cases can be handled such as:\n",
"- Analysis of Covariance, ANCOVA - 'controlling' for a variable before the ANOVA is run, `pg.ancova`.\n",
"- Repeated measures ANOVA - analysing fully repeated measures, all within-participants, `pg.rm_anova`.\n",
"- Mixed ANOVA - analysing data with one variable measured between, and another within, `pg.mixed_anova`.\n",
"\n",
"The latter two will required your data in **long-form**. `\n",
"pingouin` has loads of other options you can try too for many things outside of ANOVA."
]
},
{
"cell_type": "markdown",
"id": "47e4e28e-0b24-49d1-b660-ed1964f12324",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"source": [
"### Part 2 - `statsmodels` for general linear models\n",
"Now for the main course. This approach will serve us for most of the rest of the module, and serve you well for the future!\n",
"\n",
"`statsmodels` is a package for building statistical models, and gives us full flexibility in building our models. It works in conjunction with `pandas` dataframes, integrating them with a *formula string* interface that lets us specify our model structure in a simple way. \n",
"\n",
"First, lets import it."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "3ab8786c-d87b-4775-a115-5a4c56da3c93",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"outputs": [],
"source": [
"# Import statsmodels formula interface\n",
"import statsmodels.formula.api as smf \n",
"# looks a little different due to structure of package"
]
},
{
"cell_type": "markdown",
"id": "677c6345-26d6-4796-96ce-8563a90ecd2d",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"source": [
"Let's build a model that predicts tip amount from the bill and number of diners. To do this, we use `statsmodels` `ols` function, pass the string that defines the model, and the data, and tell it `.fit()`. A lot of steps, but simple ones:\n"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "ab4ec5c5-d1a4-415b-9e04-0760ab96efbc",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"outputs": [],
"source": [
"# Build and fit our model\n",
"first_model = smf.ols('tip ~ 1 + total_bill + size', data=tips).fit()"
]
},
{
"cell_type": "markdown",
"id": "a63a6b78-60b2-491c-8561-8168e536c230",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"source": [
"Things to note:\n",
"- The names of the included variables are the column names of the dataframe.\n",
"- The DV is on the left and separated with `~`, which can be read as 'is predicted/modelled by'.\n",
"- The `1` means 'fit an intercept`. We almost always want this.\n",
"- We include predictors by literally adding them to the model with a plus.\n",
"\n",
"Once we have fit this model, we can ask it to provide us with a `.summary()`, which gives the readout of information in the regression."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "10796c9d-a9c0-435f-8864-ef9450fbf454",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"
OLS Regression Results
\n",
"
\n",
"
Dep. Variable:
tip
R-squared:
0.468
\n",
"
\n",
"
\n",
"
Model:
OLS
Adj. R-squared:
0.463
\n",
"
\n",
"
\n",
"
No. Observations:
244
F-statistic:
105.9
\n",
"
\n",
"
\n",
"
Covariance Type:
nonrobust
Prob (F-statistic):
9.67e-34
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
coef
std err
t
P>|t|
[0.025
0.975]
\n",
"
\n",
"
\n",
"
Intercept
0.6689
0.194
3.455
0.001
0.288
1.050
\n",
"
\n",
"
\n",
"
total_bill
0.0927
0.009
10.172
0.000
0.075
0.111
\n",
"
\n",
"
\n",
"
size
0.1926
0.085
2.258
0.025
0.025
0.361
\n",
"
\n",
"
Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified."
],
"text/latex": [
"\\begin{center}\n",
"\\begin{tabular}{lclc}\n",
"\\toprule\n",
"\\textbf{Dep. Variable:} & tip & \\textbf{ R-squared: } & 0.468 \\\\\n",
"\\textbf{Model:} & OLS & \\textbf{ Adj. R-squared: } & 0.463 \\\\\n",
"\\textbf{No. Observations:} & 244 & \\textbf{ F-statistic: } & 105.9 \\\\\n",
"\\textbf{Covariance Type:} & nonrobust & \\textbf{ Prob (F-statistic):} & 9.67e-34 \\\\\n",
"\\bottomrule\n",
"\\end{tabular}\n",
"\\begin{tabular}{lcccccc}\n",
" & \\textbf{coef} & \\textbf{std err} & \\textbf{t} & \\textbf{P$> |$t$|$} & \\textbf{[0.025} & \\textbf{0.975]} \\\\\n",
"\\midrule\n",
"\\textbf{Intercept} & 0.6689 & 0.194 & 3.455 & 0.001 & 0.288 & 1.050 \\\\\n",
"\\textbf{total\\_bill} & 0.0927 & 0.009 & 10.172 & 0.000 & 0.075 & 0.111 \\\\\n",
"\\textbf{size} & 0.1926 & 0.085 & 2.258 & 0.025 & 0.025 & 0.361 \\\\\n",
"\\bottomrule\n",
"\\end{tabular}\n",
"%\\caption{OLS Regression Results}\n",
"\\end{center}\n",
"\n",
"Notes: \\newline\n",
" [1] Standard Errors assume that the covariance matrix of the errors is correctly specified."
],
"text/plain": [
"\n",
"\"\"\"\n",
" OLS Regression Results \n",
"==============================================================================\n",
"Dep. Variable: tip R-squared: 0.468\n",
"Model: OLS Adj. R-squared: 0.463\n",
"No. Observations: 244 F-statistic: 105.9\n",
"Covariance Type: nonrobust Prob (F-statistic): 9.67e-34\n",
"==============================================================================\n",
" coef std err t P>|t| [0.025 0.975]\n",
"------------------------------------------------------------------------------\n",
"Intercept 0.6689 0.194 3.455 0.001 0.288 1.050\n",
"total_bill 0.0927 0.009 10.172 0.000 0.075 0.111\n",
"size 0.1926 0.085 2.258 0.025 0.025 0.361\n",
"==============================================================================\n",
"\n",
"Notes:\n",
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n",
"\"\"\""
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Show the summary of the fitted model\n",
"first_model.summary(slim=True)\n",
"# slim=False has more, but less relevant, info"
]
},
{
"cell_type": "markdown",
"id": "b114ceba-d5eb-46b1-ab57-6a587a5ba48c",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"source": [
"The `formula` interface is very flexible and allows you to alter the variables in the dataframe by modifying the formula. For example, if we want to z-score our predictors, we can use the `scale` function directly in the formula, and it will adjust the variables for us:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "d6f6956d-4c2a-4ae2-ad50-09db6211956f",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"
OLS Regression Results
\n",
"
\n",
"
Dep. Variable:
tip
R-squared:
0.468
\n",
"
\n",
"
\n",
"
Model:
OLS
Adj. R-squared:
0.463
\n",
"
\n",
"
\n",
"
No. Observations:
244
F-statistic:
105.9
\n",
"
\n",
"
\n",
"
Covariance Type:
nonrobust
Prob (F-statistic):
9.67e-34
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
coef
std err
t
P>|t|
[0.025
0.975]
\n",
"
\n",
"
\n",
"
Intercept
2.9983
0.065
46.210
0.000
2.870
3.126
\n",
"
\n",
"
\n",
"
scale(total_bill)
0.8237
0.081
10.172
0.000
0.664
0.983
\n",
"
\n",
"
\n",
"
scale(size)
0.1828
0.081
2.258
0.025
0.023
0.342
\n",
"
\n",
"
Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified."
],
"text/latex": [
"\\begin{center}\n",
"\\begin{tabular}{lclc}\n",
"\\toprule\n",
"\\textbf{Dep. Variable:} & tip & \\textbf{ R-squared: } & 0.468 \\\\\n",
"\\textbf{Model:} & OLS & \\textbf{ Adj. R-squared: } & 0.463 \\\\\n",
"\\textbf{No. Observations:} & 244 & \\textbf{ F-statistic: } & 105.9 \\\\\n",
"\\textbf{Covariance Type:} & nonrobust & \\textbf{ Prob (F-statistic):} & 9.67e-34 \\\\\n",
"\\bottomrule\n",
"\\end{tabular}\n",
"\\begin{tabular}{lcccccc}\n",
" & \\textbf{coef} & \\textbf{std err} & \\textbf{t} & \\textbf{P$> |$t$|$} & \\textbf{[0.025} & \\textbf{0.975]} \\\\\n",
"\\midrule\n",
"\\textbf{Intercept} & 2.9983 & 0.065 & 46.210 & 0.000 & 2.870 & 3.126 \\\\\n",
"\\textbf{scale(total\\_bill)} & 0.8237 & 0.081 & 10.172 & 0.000 & 0.664 & 0.983 \\\\\n",
"\\textbf{scale(size)} & 0.1828 & 0.081 & 2.258 & 0.025 & 0.023 & 0.342 \\\\\n",
"\\bottomrule\n",
"\\end{tabular}\n",
"%\\caption{OLS Regression Results}\n",
"\\end{center}\n",
"\n",
"Notes: \\newline\n",
" [1] Standard Errors assume that the covariance matrix of the errors is correctly specified."
],
"text/plain": [
"\n",
"\"\"\"\n",
" OLS Regression Results \n",
"==============================================================================\n",
"Dep. Variable: tip R-squared: 0.468\n",
"Model: OLS Adj. R-squared: 0.463\n",
"No. Observations: 244 F-statistic: 105.9\n",
"Covariance Type: nonrobust Prob (F-statistic): 9.67e-34\n",
"=====================================================================================\n",
" coef std err t P>|t| [0.025 0.975]\n",
"-------------------------------------------------------------------------------------\n",
"Intercept 2.9983 0.065 46.210 0.000 2.870 3.126\n",
"scale(total_bill) 0.8237 0.081 10.172 0.000 0.664 0.983\n",
"scale(size) 0.1828 0.081 2.258 0.025 0.023 0.342\n",
"=====================================================================================\n",
"\n",
"Notes:\n",
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n",
"\"\"\""
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Scale predictors\n",
"scale_predictors = smf.ols('tip ~ 1 + scale(total_bill) + scale(size)', \n",
" data=tips).fit()\n",
"scale_predictors.summary(slim=True)"
]
},
{
"cell_type": "markdown",
"id": "48ac5416-3206-41b9-84fa-393af7ca6148",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"source": [
"Extending to the DV is also simple - this is the 'standardised coefficients' you see in some statistics software:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "f8e1eca1-e94e-46f4-adae-a770f73d2e05",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"
OLS Regression Results
\n",
"
\n",
"
Dep. Variable:
scale(tip)
R-squared:
0.468
\n",
"
\n",
"
\n",
"
Model:
OLS
Adj. R-squared:
0.463
\n",
"
\n",
"
\n",
"
No. Observations:
244
F-statistic:
105.9
\n",
"
\n",
"
\n",
"
Covariance Type:
nonrobust
Prob (F-statistic):
9.67e-34
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
coef
std err
t
P>|t|
[0.025
0.975]
\n",
"
\n",
"
\n",
"
Intercept
-3.851e-16
0.047
-8.2e-15
1.000
-0.093
0.093
\n",
"
\n",
"
\n",
"
scale(total_bill)
0.5965
0.059
10.172
0.000
0.481
0.712
\n",
"
\n",
"
\n",
"
scale(size)
0.1324
0.059
2.258
0.025
0.017
0.248
\n",
"
\n",
"
Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified."
],
"text/latex": [
"\\begin{center}\n",
"\\begin{tabular}{lclc}\n",
"\\toprule\n",
"\\textbf{Dep. Variable:} & scale(tip) & \\textbf{ R-squared: } & 0.468 \\\\\n",
"\\textbf{Model:} & OLS & \\textbf{ Adj. R-squared: } & 0.463 \\\\\n",
"\\textbf{No. Observations:} & 244 & \\textbf{ F-statistic: } & 105.9 \\\\\n",
"\\textbf{Covariance Type:} & nonrobust & \\textbf{ Prob (F-statistic):} & 9.67e-34 \\\\\n",
"\\bottomrule\n",
"\\end{tabular}\n",
"\\begin{tabular}{lcccccc}\n",
" & \\textbf{coef} & \\textbf{std err} & \\textbf{t} & \\textbf{P$> |$t$|$} & \\textbf{[0.025} & \\textbf{0.975]} \\\\\n",
"\\midrule\n",
"\\textbf{Intercept} & -3.851e-16 & 0.047 & -8.2e-15 & 1.000 & -0.093 & 0.093 \\\\\n",
"\\textbf{scale(total\\_bill)} & 0.5965 & 0.059 & 10.172 & 0.000 & 0.481 & 0.712 \\\\\n",
"\\textbf{scale(size)} & 0.1324 & 0.059 & 2.258 & 0.025 & 0.017 & 0.248 \\\\\n",
"\\bottomrule\n",
"\\end{tabular}\n",
"%\\caption{OLS Regression Results}\n",
"\\end{center}\n",
"\n",
"Notes: \\newline\n",
" [1] Standard Errors assume that the covariance matrix of the errors is correctly specified."
],
"text/plain": [
"\n",
"\"\"\"\n",
" OLS Regression Results \n",
"==============================================================================\n",
"Dep. Variable: scale(tip) R-squared: 0.468\n",
"Model: OLS Adj. R-squared: 0.463\n",
"No. Observations: 244 F-statistic: 105.9\n",
"Covariance Type: nonrobust Prob (F-statistic): 9.67e-34\n",
"=====================================================================================\n",
" coef std err t P>|t| [0.025 0.975]\n",
"-------------------------------------------------------------------------------------\n",
"Intercept -3.851e-16 0.047 -8.2e-15 1.000 -0.093 0.093\n",
"scale(total_bill) 0.5965 0.059 10.172 0.000 0.481 0.712\n",
"scale(size) 0.1324 0.059 2.258 0.025 0.017 0.248\n",
"=====================================================================================\n",
"\n",
"Notes:\n",
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n",
"\"\"\""
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Scale predictors\n",
"scale_all = smf.ols('scale(tip) ~ 1 + scale(total_bill) + scale(size)',\n",
" data=tips).fit()\n",
"scale_all.summary(slim=True)"
]
},
{
"cell_type": "markdown",
"id": "ca6c64fd-2357-48f8-86c6-247ae7655bf4",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"source": [
"#### Extra scores and attributes\n",
"While the `.summary()` printout has a lot of useful features, there are a few other things to know about. \n",
"- the `.fittedvalues` attribute contains the actual predictions.\n",
"- the `.resid` attribute contains the residuals or errors.\n",
"- `.summary()` doesn't show us the RMSE, or how wrong our model is on average. We can import a function from `statsmodels` to do that for us, which we do next.\n",
"\n",
"We need to import the `import statsmodels.tools.eval_measures` package, and use its `.rmse` function. We need the predictions and the actual data, which is easy to access!"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "2c8ec745-f13b-4c25-a29c-e11ac28340aa",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"1.007256127114662"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Import it with the meausres name\n",
"import statsmodels.tools.eval_measures as measures\n",
"measures.rmse(first_model.fittedvalues, tips['tip']) \n",
"# notice we put in the predictions, and then the actual values"
]
},
{
"cell_type": "markdown",
"id": "62d995e0-89f3-44af-8d72-19e30edaeb5d",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"source": [
"### Part 3 - Wait, its just the general linear model?\n",
"Now we've seen how easy it is to fit models, we can demonstrate some equivalences between standard statistics and how they are instances of the general linear model. First, a correlation.\n",
"\n",
"A correlation is a general linear model, when you have **one predictor**, and you **standardise both the predictor and the dependent variable**.\n",
"\n",
"Lets re-correlate tips and total bill:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "9ff31dc0-034e-44fc-aae9-8e821f581431",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
n
\n",
"
r
\n",
"
CI95%
\n",
"
p-val
\n",
"
BF10
\n",
"
power
\n",
"
\n",
" \n",
" \n",
"
\n",
"
pearson
\n",
"
244
\n",
"
0.675734
\n",
"
[0.6, 0.74]
\n",
"
6.692471e-34
\n",
"
4.952e+30
\n",
"
1.0
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" n r CI95% p-val BF10 power\n",
"pearson 244 0.675734 [0.6, 0.74] 6.692471e-34 4.952e+30 1.0"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Correlate\n",
"pg.corr(tips['tip'], tips['total_bill'])"
]
},
{
"cell_type": "markdown",
"id": "80f8a602-0095-4ff8-a39e-63ceafe488ad",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"source": [
"Now lets perform a linear model, scaling both the DV and predictor (which is which does not matter):"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "7bb7d7a0-e0dc-4a1c-9cf0-cff202868e5c",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"
OLS Regression Results
\n",
"
\n",
"
Dep. Variable:
scale(total_bill)
R-squared:
0.457
\n",
"
\n",
"
\n",
"
Model:
OLS
Adj. R-squared:
0.454
\n",
"
\n",
"
\n",
"
No. Observations:
244
F-statistic:
203.4
\n",
"
\n",
"
\n",
"
Covariance Type:
nonrobust
Prob (F-statistic):
6.69e-34
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
coef
std err
t
P>|t|
[0.025
0.975]
\n",
"
\n",
"
\n",
"
Intercept
7.286e-16
0.047
1.54e-14
1.000
-0.093
0.093
\n",
"
\n",
"
\n",
"
scale(tip)
0.6757
0.047
14.260
0.000
0.582
0.769
\n",
"
\n",
"
Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified."
],
"text/latex": [
"\\begin{center}\n",
"\\begin{tabular}{lclc}\n",
"\\toprule\n",
"\\textbf{Dep. Variable:} & scale(total\\_bill) & \\textbf{ R-squared: } & 0.457 \\\\\n",
"\\textbf{Model:} & OLS & \\textbf{ Adj. R-squared: } & 0.454 \\\\\n",
"\\textbf{No. Observations:} & 244 & \\textbf{ F-statistic: } & 203.4 \\\\\n",
"\\textbf{Covariance Type:} & nonrobust & \\textbf{ Prob (F-statistic):} & 6.69e-34 \\\\\n",
"\\bottomrule\n",
"\\end{tabular}\n",
"\\begin{tabular}{lcccccc}\n",
" & \\textbf{coef} & \\textbf{std err} & \\textbf{t} & \\textbf{P$> |$t$|$} & \\textbf{[0.025} & \\textbf{0.975]} \\\\\n",
"\\midrule\n",
"\\textbf{Intercept} & 7.286e-16 & 0.047 & 1.54e-14 & 1.000 & -0.093 & 0.093 \\\\\n",
"\\textbf{scale(tip)} & 0.6757 & 0.047 & 14.260 & 0.000 & 0.582 & 0.769 \\\\\n",
"\\bottomrule\n",
"\\end{tabular}\n",
"%\\caption{OLS Regression Results}\n",
"\\end{center}\n",
"\n",
"Notes: \\newline\n",
" [1] Standard Errors assume that the covariance matrix of the errors is correctly specified."
],
"text/plain": [
"\n",
"\"\"\"\n",
" OLS Regression Results \n",
"==============================================================================\n",
"Dep. Variable: scale(total_bill) R-squared: 0.457\n",
"Model: OLS Adj. R-squared: 0.454\n",
"No. Observations: 244 F-statistic: 203.4\n",
"Covariance Type: nonrobust Prob (F-statistic): 6.69e-34\n",
"==============================================================================\n",
" coef std err t P>|t| [0.025 0.975]\n",
"------------------------------------------------------------------------------\n",
"Intercept 7.286e-16 0.047 1.54e-14 1.000 -0.093 0.093\n",
"scale(tip) 0.6757 0.047 14.260 0.000 0.582 0.769\n",
"==============================================================================\n",
"\n",
"Notes:\n",
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n",
"\"\"\""
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Linear model\n",
"smf.ols('scale(total_bill) ~ scale(tip)', \n",
" data=tips).fit().summary(slim=True)"
]
},
{
"cell_type": "markdown",
"id": "34679c6e-408c-49a2-92a7-70cb1c46088e",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "fragment"
},
"tags": []
},
"source": [
"The intercept is as close to zero as our computer can handle, and the coefficient equals the correlation!"
]
},
{
"cell_type": "markdown",
"id": "27a1439c-919d-4ed9-b95d-5cc51d4fe079",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"source": [
"A t-test is also simple, comparing tips between females and males:"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "787cae76-18d5-43a4-a5c6-645d98ff0026",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"
"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Raw data\n",
"axis = sns.stripplot(data=tips, y='tip', x='sex', alpha=.3)\n",
"# Adds the means of each sex\n",
"sns.pointplot(data=tips, y='tip', x='sex', \n",
" linestyle='none', color='black', ax=axis, zorder=3)\n",
"# Add the predictions\n",
"sns.lineplot(x=female_one_male_zero,\n",
" y=ttest_model.fittedvalues,\n",
" color='black');"
]
},
{
"cell_type": "markdown",
"id": "1976623f-3469-465c-9f93-cb57cc18e2c8",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"source": [
"Finally, we will consider how a one-way ANOVA works, and demonstrate how a linear model 'sees' an ANOVA.\n",
"\n",
"Lets compare the means of tips given on the different days (Thursday, Friday, Saturday, Sunday). The traditional approach here is the one-way ANOVA, done like so:"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "5c137180-2512-4401-b7d6-640d58f5aa60",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
Source
\n",
"
ddof1
\n",
"
ddof2
\n",
"
F
\n",
"
p-unc
\n",
"
n2
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
day
\n",
"
3
\n",
"
240
\n",
"
1.672355
\n",
"
0.173589
\n",
"
0.020476
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Source ddof1 ddof2 F p-unc n2\n",
"0 day 3 240 1.672355 0.173589 0.020476"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# One way ANOVA\n",
"pg.anova(data=tips, between=['day'], dv='tip', effsize='n2')"
]
},
{
"cell_type": "markdown",
"id": "72f7f357-6830-473b-8125-92550eb03fa0",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"source": [
"Building this model as a GLM is conceptually very simple - we predict tips, from day. Lets fit this model and explore some consequences."
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "f29a1b13-ab27-4a12-a437-e2df1ab8a7a5",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"
OLS Regression Results
\n",
"
\n",
"
Dep. Variable:
tip
R-squared:
0.020
\n",
"
\n",
"
\n",
"
Model:
OLS
Adj. R-squared:
0.008
\n",
"
\n",
"
\n",
"
No. Observations:
244
F-statistic:
1.672
\n",
"
\n",
"
\n",
"
Covariance Type:
nonrobust
Prob (F-statistic):
0.174
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
coef
std err
t
P>|t|
[0.025
0.975]
\n",
"
\n",
"
\n",
"
Intercept
2.7715
0.175
15.837
0.000
2.427
3.116
\n",
"
\n",
"
\n",
"
day[T.Fri]
-0.0367
0.361
-0.102
0.919
-0.748
0.675
\n",
"
\n",
"
\n",
"
day[T.Sat]
0.2217
0.229
0.968
0.334
-0.229
0.673
\n",
"
\n",
"
\n",
"
day[T.Sun]
0.4837
0.236
2.051
0.041
0.019
0.948
\n",
"
\n",
"
Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified."
],
"text/latex": [
"\\begin{center}\n",
"\\begin{tabular}{lclc}\n",
"\\toprule\n",
"\\textbf{Dep. Variable:} & tip & \\textbf{ R-squared: } & 0.020 \\\\\n",
"\\textbf{Model:} & OLS & \\textbf{ Adj. R-squared: } & 0.008 \\\\\n",
"\\textbf{No. Observations:} & 244 & \\textbf{ F-statistic: } & 1.672 \\\\\n",
"\\textbf{Covariance Type:} & nonrobust & \\textbf{ Prob (F-statistic):} & 0.174 \\\\\n",
"\\bottomrule\n",
"\\end{tabular}\n",
"\\begin{tabular}{lcccccc}\n",
" & \\textbf{coef} & \\textbf{std err} & \\textbf{t} & \\textbf{P$> |$t$|$} & \\textbf{[0.025} & \\textbf{0.975]} \\\\\n",
"\\midrule\n",
"\\textbf{Intercept} & 2.7715 & 0.175 & 15.837 & 0.000 & 2.427 & 3.116 \\\\\n",
"\\textbf{day[T.Fri]} & -0.0367 & 0.361 & -0.102 & 0.919 & -0.748 & 0.675 \\\\\n",
"\\textbf{day[T.Sat]} & 0.2217 & 0.229 & 0.968 & 0.334 & -0.229 & 0.673 \\\\\n",
"\\textbf{day[T.Sun]} & 0.4837 & 0.236 & 2.051 & 0.041 & 0.019 & 0.948 \\\\\n",
"\\bottomrule\n",
"\\end{tabular}\n",
"%\\caption{OLS Regression Results}\n",
"\\end{center}\n",
"\n",
"Notes: \\newline\n",
" [1] Standard Errors assume that the covariance matrix of the errors is correctly specified."
],
"text/plain": [
"\n",
"\"\"\"\n",
" OLS Regression Results \n",
"==============================================================================\n",
"Dep. Variable: tip R-squared: 0.020\n",
"Model: OLS Adj. R-squared: 0.008\n",
"No. Observations: 244 F-statistic: 1.672\n",
"Covariance Type: nonrobust Prob (F-statistic): 0.174\n",
"==============================================================================\n",
" coef std err t P>|t| [0.025 0.975]\n",
"------------------------------------------------------------------------------\n",
"Intercept 2.7715 0.175 15.837 0.000 2.427 3.116\n",
"day[T.Fri] -0.0367 0.361 -0.102 0.919 -0.748 0.675\n",
"day[T.Sat] 0.2217 0.229 0.968 0.334 -0.229 0.673\n",
"day[T.Sun] 0.4837 0.236 2.051 0.041 0.019 0.948\n",
"==============================================================================\n",
"\n",
"Notes:\n",
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n",
"\"\"\""
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Fit the ANOVA style model\n",
"anova = smf.ols('tip ~ day', data=tips).fit()\n",
"anova.summary(slim=True)"
]
},
{
"cell_type": "markdown",
"id": "c6b4e472-c2a7-4f56-b5b7-5c6388d1f50b",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"source": [
"Note that the $R^2$ is the same, as is the F statistic and the p-value!\n",
"\n",
"The coefficients are interesting here. You can see they state Fri, Sat, and Sun. Where is Thursday? Much like in the t-test example, *Thursday is absorbed into intercept*. \n",
"Each coefficient represents the difference between the intercept and that particular day. When all the coefficient are zero (i.e., not that day) then the model returns the mean of Thursdays tips. Again we will focus on our models capabilities to predict things as we continue, but in these instances it is good to know what the coefficients mean before things get too complex.\n"
]
},
{
"cell_type": "markdown",
"id": "b505ace7-0c00-44a2-8260-228b92b724ed",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"source": [
"### The design matrix\n",
"This is a technical aside but can be useful in gaining understanding as we move forward. Behind every model you build in `statsmodels`, there is something called the *design matrix*. The data you provide isn't always the data that the model is fitted to, but it is transformed in a way that OLS can use. For example, you can't do statistics on strings ('Thursday', 'Female', etc) - there is a transform that happens to make a numeric version of it. This is sometimes referred to as 'dummy coding', where a categorical variable will be spread into as many columns as there are levels of the variable, with a '1' representing a particular level and zero elsewhere. \n",
"\n",
"An example will help. Consider the following dataframe that has some simple categorical-style data."
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "93484d38-ae2a-41b0-9637-d9b3d3dae20f",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
categories
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
Wales
\n",
"
\n",
"
\n",
"
1
\n",
"
Wales
\n",
"
\n",
"
\n",
"
2
\n",
"
England
\n",
"
\n",
"
\n",
"
3
\n",
"
Scotland
\n",
"
\n",
"
\n",
"
4
\n",
"
Ireland
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" categories\n",
"0 Wales\n",
"1 Wales\n",
"2 England\n",
"3 Scotland\n",
"4 Ireland"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Create a small categorical dataset\n",
"df = pd.DataFrame({'categories': ['Wales', 'Wales', 'England', \n",
" 'Scotland', 'Ireland']})\n",
"display(df)"
]
},
{
"cell_type": "markdown",
"id": "ee79dae4-c14c-4a16-ae35-768477118575",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"source": [
"We could imagine this is a predictor in a regression. In its current form, it won't work. However, we can import something called `patsy`, a package that does the translating of datasets for us for our models. It has a function called the `dmatrix` which will turn this data into what a model will see, that is, the design matrix. It works in much the same way as `smf.ols` does. Lets import it and demonstrate:"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "cb0a0518-ca67-496b-82e2-645884c8b353",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
Intercept
\n",
"
categories[T.Ireland]
\n",
"
categories[T.Scotland]
\n",
"
categories[T.Wales]
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
1.0
\n",
"
0.0
\n",
"
0.0
\n",
"
1.0
\n",
"
\n",
"
\n",
"
1
\n",
"
1.0
\n",
"
0.0
\n",
"
0.0
\n",
"
1.0
\n",
"
\n",
"
\n",
"
2
\n",
"
1.0
\n",
"
0.0
\n",
"
0.0
\n",
"
0.0
\n",
"
\n",
"
\n",
"
3
\n",
"
1.0
\n",
"
0.0
\n",
"
1.0
\n",
"
0.0
\n",
"
\n",
"
\n",
"
4
\n",
"
1.0
\n",
"
1.0
\n",
"
0.0
\n",
"
0.0
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Intercept categories[T.Ireland] categories[T.Scotland] \\\n",
"0 1.0 0.0 0.0 \n",
"1 1.0 0.0 0.0 \n",
"2 1.0 0.0 0.0 \n",
"3 1.0 0.0 1.0 \n",
"4 1.0 1.0 0.0 \n",
"\n",
" categories[T.Wales] \n",
"0 1.0 \n",
"1 1.0 \n",
"2 0.0 \n",
"3 0.0 \n",
"4 0.0 "
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Import just the design matrix\n",
"from patsy import dmatrix\n",
"dmatrix('~ categories', data=df, return_type='dataframe') \n",
"# notice there's no DV but the formula is the same!"
]
},
{
"cell_type": "markdown",
"id": "bbd30423-96c2-445e-a069-b8745c2fb953",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"source": [
"You can see that it has chosen England to become the intercept (this is chosen alphabetically), and each column has a '1' where the title of that column exists in the original data. This representation is what Python uses to build our models. This is not essential to know but it will deepen your understanding."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.2"
}
},
"nbformat": 4,
"nbformat_minor": 5
}